BANK OF ENGLISH AND BEYOND: Hand-crafted parsers for functional annotation
Abstract
The 200-million-word corpus of the Bank of English was annotated morphologically and syntactically using the English Constraint Grammar analyser, a rule-based shallow parser developed at the Research Unit for Computational Linguistics, University of Helsinki. We discuss the annotation system and methods used in the corpus work, as well as the theoretical assumptions of the Constraint Grammar syntax. Based on our experience in large-scale corpus work, we argue for a deeper and more explicit, dependency-based syntactic representation. We present a new practical parsing system, the Functional Dependency Grammar parser, developed from the Constraint Grammar system, and discuss its suitability for treebank annotation.
Related resources
Automatic detection of English inclusions in mixed-lingual text with an application to parsing
The influence of English continues to grow to the extent that its expressions have begun to permeate the original forms of other languages. It has become more acceptable, and in some cases fashionable, for people to combine English phrases with their native tongue. This language mixing phenomenon typically occurs initially in conversation and subsequently in written form. In fact, there is evid...
Beyond Skeleton Parsing: Producing a Comprehensive Large-Scale General-English Treebank With Full Grammatical Analysis
A treebank is a body of natural language text which has been grammatically annotated by hand, in terms of some previously established scheme of grammatical analysis. Treebanks have been used within the field of natural language processing as a source of training data for statistical part-of-speech taggers (Black et al., 1992; Brill, 1994; Merialdo, 1994; Weischedel et al., 1993) and for statist...
Generalized Higher-Order Dependency Parsing with Cube Pruning
State-of-the-art graph-based parsers use features over higher-order dependencies that rely on decoding algorithms that are slow and difficult to generalize. On the other hand, transition-based dependency parsers can easily utilize such features without increasing the linear complexity of the shift-reduce system beyond a constant. In this paper, we attempt to address this imbalance for graph-bas...
Structural metadata annotation: moving beyond English
The goal of metadata extraction (MDE) is to enable technology that can take raw speech-to-text output and refine it into forms that are more useful to humans and to downstream automatic processes. Starting in 2003, a structural metadata annotation task was defined for English as part of the DARPA EARS Program. A significant new challenge for MDE is the addition of new languages. This paper repo...
Mobile, L2 vocabulary learning, and fighting illiteracy: A case study of Iranian semi-illiterates beyond transition level
As mobile learning simultaneously employs both handheld computers and mobile telephones and other devices that draw on the same set of functionalities, it throws open the door for swift connection between learners and teachers. This study examined and articulated the impact of the application of mobile devices for teaching English vocabulary items to 123 Iranian semi-illitera...